perm filename CIRCUM[F83,JMC] blob sn#763061 filedate 1984-07-16 generic text, type C, neo UTF8
.<<circum[f83,jmc]>>
.require "memo.pub[let,jmc]" source;
.at "q1" ⊂"%41%*"⊃
.at "q2" ⊂"%42%*"⊃
.at "qi" ⊂"%4i%*"⊃
.at "qqn" ⊂"%4n%*"⊃
.at "qqm" ⊂"%4m%*"⊃

.every heading({date},Applications of Circumscription,{page})
.every footing(,,JUST ANOTHER DRAFT)
.cb Applications of Circumscription to Formalizing Common Sense Knowledge

.cb "by John McCarthy, Stanford University"

.skip 5

Abstract: We present a new and more symmetric version of the
circumscription method of non-monotonic reasoning first described in
(McCarthy 1980) and some
applications to formalizing common sense knowledge.  The applications in
this paper are mostly based on minimizing the abnormality of different
aspects of various entities.  Included are non-monotonic treatments of ⊗is-a 
hierarchies, the unique names hypothesis, and the frame problem.
The new circumscription may be called %2formula circumscription%1 to
distinguish it from the previously defined %2domain circumscription%1 and
%2predicate circumscription%1.  A still more general formalism called
%2prioritized circumscription%1 is briefly explored.

	This draft contains only a preliminary treatment of prioritized
circumscription.
.skip 5

Descriptors: non-monotonic logic, common sense, artificial intelligence.

.skip 20

This version of circum[f83,jmc] pubbed at {time} on {date}.

.skip to column 1
.double space
#. %3Introduction and new definition of circumscription%1

	(McCarthy 1980) introduces the circumscription method of
non-monotonic reasoning and gives motivation, some mathematical properties
and some examples of its application.  While the present paper is
logically self-contained, motivation may require some acquaintance with
the earlier paper.  In particular we don't repeat its arguments about the
importance of non-monotonic reasoning in AI.  Also its examples are
instructive.

	Here we give a more symmetric definition of
circumscription and applications to the formal expression of common
sense facts.  Our long term goal (far from realized in the present paper)
 is to express these facts in a way that would
be suitable for inclusion in a general purpose database of common
sense facts.  We imagine this database to be used by AI programs
written after the initial preparation of the database.  It would
be best if the writers of these programs didn't have to
be familiar with how the common sense facts about particular
phenomena are expressed.
Thus
common sense knowledge must be represented in a way that is not
specific to a particular application.

	It turns out that many such common sense facts can be formalized
in a uniform way.  A single predicate ⊗ab, standing for "abnormal" is
circumscribed with certain other predicates and functions considered as
variables that can be constrained to achieve the circumscription subject
to the axioms.  So far this seems to cover the use of circumscription to
represent default rules.
On the other hand, it doesn't seem that circumscribing abnormality can
be used to cover many of the examples of (McCarthy 1980) in which we
want to limit a set to those objects that must be members.

.skip 1
#. %3A new version of circumscription.%1

.skip 1

%3Definition:%1 Let  %2A(P)%1 be a formula of second order logic, where ⊗P is
a tuple of some of the free predicate symbols in ⊗A(P).  
Let ⊗E(P,x)  be a wff in which
⊗P  and a tuple ⊗x of individual variables occur free.  The circumscription
of  ⊗E(P,x)  relative to  ⊗A(P)  is the formula  ⊗A'(P)  defined by

!!a1:	%2A(P) ∧ ∀P'.[A(P') ∧ [∀x.E(P',x) ⊃ E(P,x)] ⊃ [∀x.E(P',x) ≡ E(P,x)]].%1

[We are here writing ⊗A(P) instead of %2A(Pq1,...,Pqqn)%1 for brevity
and likewise writing ⊗E(P,x) instead of
%2E(Pq1,...,Pqqn,xq1,...,xqqm)%1.
Likewise the quantifier ⊗∀x stands for %2∀xq1...xqqm%1.
In principle ⊗A(P) may have embedded quantifiers, but we have no applications
yet that use this generality.  Circumscription is a kind of minimization,
and the predicate symbols in ⊗A(P) that are not in ⊗P itself act as
parameters in this minimization.  When we wish to mention
these other predicates we will write ⊗A(P;Q) and ⊗E(P;Q,x) where
⊗Q is a vector of predicate symbols which are not allowed to be varied.]

.skip 1
	There are two differences between this and (McCarthy 1980).  First,
in that paper  ⊗E(P,x)  had the specific form  ⊗P(x).  Here we speak of
circumscribing a wff and call the method %2formula circumscription%1,
while there we could speak of circumscribing a predicate.
We can still speak of circumscribing the predicate ⊗P when ⊗E(P,x) has
the special form ⊗P(x).  Formula circumscription is more symmetric in that
all of the predicate symbols in ⊗P are regarded as variables, and a wff
is minimized, whereas the earlier form distinguishes one of the predicates
themselves for minimization.  However, formula circumscription is reducible to
predicate circumscription.

	Second, in ({eq a1}) we use an explicit quantifier for the predicate
variable  ⊗P' whereas in (McCarthy 1980), the formula was a schema.
One advantage of the present formalism is that now  ⊗A'(P)  is the
same kind of formula as ⊗A(P) and can be used as part of the axiom for
circumscribing some other wff.

Remark: The predicate symbols %2Pq1,...,Pqqn%1 will not usually be all the
predicate symbols occurring in ⊗A(P). 
The situation is analogous to a minimization problem in calculus
or calculus of variations.  Some quantity is being minimized.  Other
quantities are variables whose values are to be chosen so as to achieve
the minimum.  Still others are parameters.  They are not varied
in the minimization process, and therefore the minimum obtained and the
values of the minimizing variables depend
on the values of the parameters.  We will point out the variables and
the parameters in the subsequent examples.  When we want to be emphatic
we will write ⊗A(P;Q) and ⊗E(P;Q,x), where the ⊗P are variable and the ⊗Q are
parameters.
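On a finite domain, the effect of ({eq a1}) can be computed by exhaustive search.  The following Python sketch is an editorial illustration, not part of the formalism; the domain, axiom and wff are toy choices made for the example.  It returns the models of ⊗A whose ⊗E-extension is inclusion-minimal, with everything not passed in acting as a parameter.

```python
from itertools import chain, combinations

def subsets(xs):
    xs = list(xs)
    return [frozenset(s) for s in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]

def circumscribe(domain, A, E):
    """Models of A whose extension {x : E(P,x)} is inclusion-minimal,
    i.e. the P admitted by the second-order formula (a1)."""
    models = [P for P in subsets(domain) if A(P)]
    def ext(P):
        return frozenset(x for x in domain if E(P, x))
    return [P for P in models if not any(ext(Q) < ext(P) for Q in models)]

# Toy instance: axiom A(P) = P(0), wff E(P,x) = P(x), so this is
# ordinary predicate circumscription of P over the domain {0, 1}.
minimal = circumscribe([0, 1], lambda P: 0 in P, lambda P, x: x in P)
```

Circumscribing ⊗P(x) relative to the axiom ⊗P(0) on the domain {0,1} leaves ⊗P true of 0 only, as predicate circumscription requires.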

	In some of the literature, it has been supposed that non-monotonic
reasoning involves giving all predicates their minimum extension.  This
mistake has led to theorems about what reasoning cannot be
done that are irrelevant to AI and database theory,
because their premisses are too narrow.
.skip 2
#. %3A typology of uses of non-monotonic reasoning%1

	Each of the several papers that introduces a mode of non-monotonic
reasoning seems to have a particular application in mind.  Perhaps we are
looking at different parts of an elephant.  Therefore, before proceeding
to applications of circumscription I want to suggest a typology of uses of
non-monotonic reasoning.  The orientation is towards circumscription,
but I suppose the considerations apply to other formalisms as well.

Non-monotonic reasoning has several uses.

1. As a communication convention.  Suppose  A  tells  B  about
a situation involving a bird.  If the bird cannot fly, and this
is relevant, then  A  must
say so.  Whereas if the bird can fly, there is
no requirement to mention the fact.  For example, if I hire you
to build me a bird cage and you don't put a top on it, I can
get out of paying for it even if you tell the judge that I never
said my bird could fly.  However, if I complain that you wasted
money by putting a top on a cage I intended for a penguin, the judge
will agree with you that if the bird couldn't fly I should have
said so.

2. As a database or information storage convention.  It may be a
convention of a particular database that certain predicates have
their minimal extension.  This generalizes the closed world
assumption.  When a database makes the closed world assumption
for all predicates it is reasonable to imbed this fact in the
programs that use the database.
However, when only some predicates are to be minimized,
we need to say which ones by appropriate
sentences of the database, perhaps as a preamble
to the collection of ground sentences that usually constitute the
main content.

	Neither 1 nor 2 requires that most birds can fly.
Should it happen that most of the birds that come up in communication
or about which information is requested from the database cannot fly, the
convention may lead to inefficiency but not incorrectness.

3. As a rule of conjecture.  This use was emphasized
in (McCarthy 1980).  The circumscriptions may be regarded as expressions of some
probabilistic notions such as "most birds can fly" or they may be
expressions of standard cases.  Thus it is simple to conjecture that
there are no relevant present
material objects other
than those whose presence can be inferred.  It is also
a simple conjecture that a tool asserted to be present is usable
for its normal function.  Such conjectures sometimes conflict, but there
is nothing wrong with having incompatible conjectures on hand.  Besides
the possibility of deciding that one is correct and the other wrong, it
is possible to use one for generating possible exceptions to the other.

4. As a representation of a policy.  The example is Doyle's "The meeting
will be on Wednesday unless another decision is explicitly made".
Again probabilities are not involved.

5. As a very streamlined expression of probabilistic information when
numerical probabilities, especially conditional probabilities, are
unobtainable.  Since circumscription doesn't provide numerical probabilities,
its probabilistic interpretation involves
probabilities that are either infinitesimal, within an
infinitesimal of one, or intermediate - without any discrimination
among the intermediate values.  The circumscriptions give conditional
probabilities.  Thus we may treat the probability that a bird
can't fly as an infinitesimal.  However, if the rare
event occurs that the bird is a penguin, then the conditional probability that
it can fly is infinitesimal, but we may hear of some rare condition
that would allow it to fly after all.

	Why don't we use finite
probabilities combined by the usual laws?  That would be fine
if we had the numbers, but circumscription is usable when we can't
get the numbers or find their use inconvenient.  Note that the
general probability that a bird can fly may be irrelevant, because
we are interested in the facts that influence our opinion about
whether a particular bird can fly in a particular situation.

	Moreover, the use of probabilities is normally considered
to require the definition of a sample space, i.e. the space of all
possibilities.  Circumscription allows one to conjecture that the
cases we know about are all that there are.  However, when additional
cases are found, the axioms don't have to be changed.  Thus there
is no fixed space of all possibilities.

	Notice also that circumscription does not provide for
weighing evidence; it is appropriate
when the information permits snap decisions.  However, many
cases nominally treated in terms of weighing information are in fact
cases in which the weights are such that circumscription and other
defaults work better.

6. Auto-epistemic reasoning.  "If I had an elder brother, I'd know it".
This has been studied by R. Moore.  Perhaps it can be handled by
circumscription.

7. Both common sense physics and common sense psychology use non-monotonic
rules.  An object will continue in a straight line if nothing interferes
with it.  A person will eat when hungry unless something prevents it.
Such rules are open ended about what might prevent the expected behavior,
and this is required, because we are always encountering unexpected
phenomena that modify the operation of our rules.  Science, as distinct
from common sense, tries to work with exceptionless rules.  However,
this means that common sense reasoning has to decide when a scientific
model is applicable, i.e. that there are no important phenomena not
taken into account by the theories being used and the model of the
particular phenomena.

	Seven different uses for non-monotonic reasoning seem
too many, so perhaps we can condense later.

.skip 2
#. %3Minimizing abnormality%1

	Many people have proposed representing facts about what is
"normally" the case.  One problem is that every object is abnormal
in some way, and we want to allow some aspects of the object to be
abnormal and still assume the normality of the rest.
We do this with a predicate  ⊗ab standing for "abnormal".  We circumscribe
%2ab_z%1.  The argument of ⊗ab will be some aspect of the entities
involved.  Some aspects can be abnormal without affecting others.  The
aspects themselves are abstract entities, and their unintuitiveness is
somewhat a blemish on the theory.

	There are many applications.

.skip 2
#. %3Whether birds can fly%1

	Marvin Minsky (1982) offered expressing the facts and
non-monotonic reasoning concerning the ability of birds to fly
as a challenge to advocates of formal systems based on mathematical
logic.

	There are many ways of non-monotonically axiomatizing the
facts about which birds can fly.  The following set of axioms using ⊗ab seems
to me quite straightforward.

!!a4a:	%2∀x.¬ab aspect1 x ⊃ ¬flies x%1.

Unless an object is abnormal in ⊗aspect1, it can't fly.  (We're using a
convention that parentheses may be omitted for functions and predicates
of one argument, so that ({eq a4a}) is the same as %2∀x.(¬ab(aspect1(x)
 ⊃ ¬flies(x))%1.)

	Note that it wouldn't work to write %2ab x%1 instead of %2ab aspect1 x%1,
because we don't want a bird that is abnormal with respect to its ability
to fly to be automatically abnormal in other respects.  Using aspects limits
the effects of proofs of abnormality.

!!a5:	%2∀x.bird x ⊃ ab aspect1 x%1.

A bird is abnormal in ⊗aspect1, so ({eq a4a}) can't be used to show it
can't fly.  If ({eq a5}) were omitted, when we did the circumscription we
would only be able to infer a disjunction.  Either a bird is abnormal in
%2aspect1%1 or it can fly unless it is abnormal in ⊗aspect2.  ({eq a5})
expresses our preference for inferring that a bird is abnormal
in ⊗aspect1 rather than ⊗aspect2.  We call ({eq a5}) a %2cancellation of
inheritance%1 axiom.  We will see more of them.

!!a6:	%2∀x.bird x ∧ ¬ab aspect2 x ⊃ flies x%1.

Unless a bird is abnormal in ⊗aspect2, it can fly.

!!a7:	%2∀x.ostrich x ⊃ ab aspect2 x%1.

Ostriches are abnormal in ⊗aspect2.  This doesn't say that an ostrich cannot
fly - merely that ({eq a6}) can't be used to infer that it does.  ({eq a7})
is another cancellation of inheritance axiom.

!!a8:	%2∀x.penguin x ⊃ ab aspect2 x%1.

Penguins are also abnormal in ⊗aspect2. 

!!a9:	%2∀x.ostrich x ∧ ¬ab aspect3 x ⊃ ¬flies x%1.

!!a10:	%2∀x.penguin x ∧¬ab aspect4 x ⊃ ¬flies x%1.

Normally ostriches and penguins can't fly.  However, there is
an out.  ({eq a9}) and ({eq a10}) provide that under unspecified conditions,
an ostrich or penguin might fly after all.  If we give no
such conditions, we will conclude that an ostrich or penguin
can't fly.  Additional objects that can fly may be specified.
Each needs two axioms.  The first says that it is abnormal in
⊗aspect1 and prevents ({eq a4a}) from being used to say that it can't
fly.  The second provides that it can fly unless it is abnormal
in yet another way.  Additional non-flying birds can also be
provided for at a cost of two axioms per kind.

	We haven't yet said that ostriches and penguins are birds,
so let's do that and throw in that canaries are birds also.

!!a11:	%2∀x.ostrich x ⊃ bird x%1

!!a12:	%2∀x.penguin x ⊃ bird x%1

!!a13:	%2∀x.canary x ∧ ¬ab aspect5 x ⊃ bird x%1.

We threw in ⊗aspect5 just in case one wanted to use the term canary
in the sense of a 1930s gangster movie.

Asserting that ostriches, penguins and canaries are birds will help
inherit other properties from the class of birds.  For example, we have

!!a14:	%2∀x. bird x ∧ ¬ab aspect6 x ⊃ feathered x%1.

So far there is nothing to prevent ostriches, penguins and canaries
from overlapping.  We could write disjointness axioms like

!!a15: %2∀x. ¬ostrich x ∨ ¬penguin x%1,

but we require  %2n(n-1)/2%1  of them if we have  ⊗n  species.
It is more efficient to write axioms like

!!a15a:	%2∀x.ostrich x ⊃ species x = 'ostrich%1,

which makes the ⊗n species disjoint with only ⊗n axioms assuming
that the distinctness of the names is apparent to the reasoner.
This problem  is like the unique names problem.

	If these are the only facts to be taken into account, we
must somehow specify that what can fly is to be determined
by circumscribing the wff ⊗ab_z using  ⊗ab and ⊗flies as variables.
Why exactly these?  If ⊗ab were not taken as variable, ⊗ab_z couldn't
vary either, and the minimization problem would go away.  Since
the purpose of the axiom set is to describe what flies, the predicate
⊗flies must be varied also.  Suppose we contemplate taking ⊗bird as
variable also.  In the first place, this violates an intuition that
deciding what flies follows deciding what is a bird in the common
sense situations we want to cover.  Secondly, if we use exactly the
above axioms and admit bird as a variable, we will further conclude
that the only birds are penguins, canaries and ostriches.  Namely,
for these entities something has to be abnormal, and therefore minimizing
⊗ab_z will involve making as few entities as possible penguins, canaries
and ostriches.  If we also admit ⊗penguin, ⊗ostrich, and ⊗canary as
variable, we will succeed in making ⊗ab_z always false, and there will be
no birds at all.

	However, if the same circumscriptions are done with additional
axioms like ⊗canary_Tweety and ⊗ostrich_Joe, we will get the
expected result that Tweety can fly and Joe cannot even if all the
above are variable.
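This conclusion can be checked mechanically.  The sketch below is an editorial illustration, not the paper's method: it encodes %2ab aspecti x%1 as the pair %2(i,x)%1, adds ⊗canary_Tweety and ⊗ostrich_Joe, holds ⊗bird, ⊗canary, ⊗ostrich and ⊗penguin fixed as parameters, and enumerates the interpretations of ⊗ab and ⊗flies satisfying ({eq a4a})-({eq a13}), keeping those with inclusion-minimal ⊗ab.

```python
from itertools import chain, combinations

def subsets(xs):
    xs = list(xs)
    return [frozenset(s) for s in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]

ENTITIES = ['Tweety', 'Joe']
# Parameters of the circumscription, held fixed (a11 and a12 hold
# because bird is fixed to include the ostriches; there are no penguins):
bird, canary, ostrich, penguin = {'Tweety', 'Joe'}, {'Tweety'}, {'Joe'}, set()
PAIRS = [(a, x) for a in (1, 2, 3, 4, 5) for x in ENTITIES]

def satisfies(ab, flies):
    """Axioms (a4a)-(a13); the pair (a, x) plays the role of ab aspecta x."""
    for x in ENTITIES:
        if (1, x) not in ab and x in flies: return False                    # a4a
        if x in bird and (1, x) not in ab: return False                     # a5
        if x in bird and (2, x) not in ab and x not in flies: return False  # a6
        if x in ostrich and (2, x) not in ab: return False                  # a7
        if x in penguin and (2, x) not in ab: return False                  # a8
        if x in ostrich and (3, x) not in ab and x in flies: return False   # a9
        if x in penguin and (4, x) not in ab and x in flies: return False   # a10
        if x in canary and (5, x) not in ab and x not in bird: return False # a13
    return True

models = [(ab, flies) for ab in subsets(PAIRS) for flies in subsets(ENTITIES)
          if satisfies(ab, flies)]
minimal = [(ab, flies) for ab, flies in models
           if not any(ab2 < ab for ab2, _ in models)]
# The unique minimal model makes Tweety fly and keeps Joe grounded.
```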

	While this works it may be more straightforward, and
therefore less likely to lead to subsequent trouble, to circumscribe
birds, ostriches and penguins with axioms like

!!a2:	%2∀x.¬ab aspect8 x ⊃ ¬bird x%1, etc.

	We have not yet specified how a program will know what to
circumscribe.  One extreme is to build it into the program, but
this is contrary to the declarative spirit.  However, a statement
of what to circumscribe isn't just a sentence of the language
because of its non-monotonic character.  My present, admittedly
somewhat unsatisfactory, idea is to include some sort of metamathematical
statement like

!!a16:	%2circumscribe(ab z; ab, flies, bird, ostrich, penguin)%1

in a "policy" database available to the program.  ({eq a16}) is
intended to mean that ⊗ab_z is to be circumscribed with ⊗ab, ⊗flies, 
⊗bird, ⊗ostrich and ⊗penguin taken as variable.  Explicitly listing
the variables makes adding new kinds awkward, since they will have
to be mentioned in the ⊗circumscribe statement.

.if false then begin
!!a2a1:	%2∀ab flies.flying-facts(ab,flies) ≡
.begin nofill
∂8∧ (∀x.¬ab aspect1 x ⊃ ¬flies x)
∂8∧ (∀x.bird x ⊃ ab aspect1 x)
∂8∧ (∀x.bird x ∧ ¬ab aspect2 x ⊃ flies x)
∂8∧ (∀x.ostrich x ⊃ ab aspect2 x)
∂8∧ (∀x.penguin x ⊃ ab aspect2 x)
∂8∧ (∀x.ostrich x ∧ ¬ab aspect3 x ⊃ ¬flies x)
∂8∧ (∀x.penguin x ∧¬ab aspect4 x ⊃ ¬flies x)
∂8∧ (∀x.ostrich x ⊃ bird x)
∂8∧ (∀x.penguin x ⊃ bird x)
∂8∧ (∀x.canary x ∧ ¬ab aspect5 x ⊃ bird x%1).
.end
.skip 1
	({eq a2}) is the circumscription, and ({eq a2a1}) defines the
predicate %2flying-facts%1 as the conjunction of axioms
({eq a4a}...{eq a13}).  This ties down precisely
what is to be varied.  However, it complicates the nonmonotonicity,
because new facts to be included in the circumscription go within
the quantifier %2∀ab_flies%1 of ({eq a2a1}).  This is not in accordance
with what was previously advertised about how nonmonotonic axiom systems
work.  It doesn't offer any additional technical difficulties, however.

.end
.if false then begin
Therefore, we write axioms

!!a16a:	%2∀x. ostrich x ≡ belongs(x,'ostrich)%1

and similarly for penguins, canaries and birds.
We can now write

!!a17:	%2∀n1 n2. n1 ≠ n2 ∧ ¬ab aspect8(n1,n2) ⊃ ∀x.¬(belongs(x,n1) ∧ belongs(x,n2))%1.

This doesn't do everything we want, because we have allowed a canary  ⊗x 
two choices.  It can be a bird avoiding %2ab_aspect5_x%1 or it can be
a non-bird avoiding %2aspect8('canary,'bird)%1.  We want the former to
have priority and can do it simply by writing

!!a18:	%2ab aspect8('canary,'bird)%1.

	In what follows we will need uniqueness axioms for the aspects,
and they are a nuisance to write.  They could be written

!!a18a:	%2∀n1 n2 x1 x2.aspect(n1,x1) = aspect(n2,x2) ≡ n1 = n2 ∧ x1 = x2%1,

if we make ⊗n a parameter, i.e. write ⊗aspect(2,x) instead of ⊗aspect2_x, but
even this doesn't do everything we need, because it doesn't cover aspects
that take more than one argument.  Aspects with many arguments can be reduced to
one argument aspects by assuming them to operate on tuples.  For this paper,
we won't make this change of notation and merely postulate that all aspects
are distinct objects, assuming as many axioms as necessary.
.end
.skip 1
#. %3The unique names hypothesis%1

	Raymond Reiter (1980b) introduced the phrase "unique names hypothesis"
for the assumption that each object has a unique name, i.e.
that distinct names denote distinct objects.  We want
to treat this non-monotonically.  Namely, we want a wff that picks out
those models of our initial assumptions that maximize the inequality of
While we're at it, we might as well try for something
stronger.  We want to maximize the extent to which distinct terms designate
distinct objects.  When there is a unique model of the axioms that maximizes
distinctness, we can put it more simply; two terms denote distinct objects
unless the axioms force them to denote the same.
  If we are even more
fortunate, as we are in the examples to be given, we can say that two
terms denote distinct objects unless their equality is provable.

	We don't know a completely satisfactory way of doing this.
Suppose that we have a language ⊗L and a theory ⊗T 
consisting of the consequences of a formula ⊗A.  
It would be most pleasant if we could just circumscribe equality,
but as Reiter and Etherington (to be published) point out, this doesn't
work, and nothing similar works.  We could hope to circumscribe
some other formula of ⊗L, but this doesn't seem
to work either.  Failing that, we could hope for some other second
order formula taken from ⊗L that would express the unique names
hypothesis, but we don't presently see how to do it.

	Our solution involves extending the language by introducing
the names themselves as the only objects.  All assertions about objects
are expressed as assertions about the names.

	We suppose our theory is such
that the names themselves
 are all provably distinct.  There are several ways of doing
this.  Let the names be ⊗n1, ⊗n2, etc.  The simplest solution is
to have an axiom %2ni_≠nj%1 for each pair of distinct names.  This
requires a number of axioms proportional to the square of the number
of names, which is sometimes objectionable.  The next solution involves
introducing an arbitrary ordering on the names.  We have special
axioms %2n1_<_n2, n2_<_n3, n3_<_n4%1, etc. and the general axioms
%2∀x_y.x_<_y_⊃_x_≠_y%1 and %2∀x_y_z._x_<_y_∧_y_<_z_⊃_x_<_z%1.  This
makes the number of axioms proportional to the number of names.
A third possibility involves mapping the names onto integers with
axioms like %2index_n1_=_1, index_n2_=_2%1, etc. and using a theory
of the integers that provides for their distinctness.  The fourth
possibility involves using string constants for the names and
"attaching" to equality in the language a subroutine that computes
whether two strings are equal.  If our names were quoted symbols as
in LISP, this amounts to having %2'a_≠_'b%1 and all its countable
infinity of analogs as axioms.  Each of these devices is useful
in appropriate circumstances.

	From the point of view of mathematical logic, there is no harm in
having an infinity of such axioms.  From the computational point of view
of a theorem proving or problem solving program, we merely suppose that we
rely on the computer to generate the assertion that two names are distinct
whenever this is required, since a subroutine can easily tell whether two
strings are the same.

	Besides axiomatizing the distinctness of the constants, we also
want to axiomatize the distinctness of terms.  This may be accomplished
by providing for each function two axioms.  Letting ⊗foo be a function
of two arguments we postulate

!!n1:	%2∀x1 x2 y1 y2.foo(x1,y1) = foo(x2,y2) ⊃ x1 = x2 ∧ y1 = y2%1

and

!!n2:	%2∀x y.fname foo(x,y) = 'foo%1.

The first axiom ensures that unless the arguments of ⊗foo are identical,
its values are distinct.  The second ensures that the values of ⊗foo 
are distinct from the values of any other function or any constant,
assuming that we refrain from naming any constant ⊗'foo. 

	These axioms amount to making our domain isomorphic to
the Herbrand universe of the language.
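For intuition, the effect of ({eq n1}) and ({eq n2}) is exactly that of structural equality on Herbrand terms.  A hypothetical rendering (an editorial aid; the function names are arbitrary) represents each term as a tagged tuple, so that built-in equality decides distinctness:

```python
# Terms as tagged tuples: the tag is the function name, the rest are
# the arguments.  Tuple equality is then exactly the Herbrand reading:
# terms are equal only if they have the same function symbol (n2) and
# equal arguments (n1).
def foo(x, y): return ('foo', x, y)
def bar(x, y): return ('bar', x, y)
def fname(t): return t[0]

t1, t2 = foo('n1', 'n2'), foo('n1', 'n2')
```

Here %2foo(n1,n2)%1 differs from %2foo(n1,n3)%1 by ({eq n1}) and from %2bar(n1,n2)%1 by ({eq n2}), just as the axioms require.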

	Now that the names are guaranteed distinct, what about the
objects they denote?  We introduce a
predicate ⊗e(x,y) and axiomatize it to be an equivalence relation.
Its intended interpretation is that the names ⊗x and ⊗y denote the same
object.  We then formulate all our usual axioms in terms of names
rather than in terms of objects.  Thus ⊗on(n1,n2) means that the object
named by ⊗n1 is on the object named by ⊗n2, and ⊗bird_x means that the
name ⊗x denotes a bird.  We add axioms of substitutivity for ⊗e with
regard to those predicates and functions
 that are translates of predicates referring
to objects rather than predicates on the names themselves.  Thus we
may have axioms

!!a4:	%2∀n1 n2 n1' n2'.e(n1,n1') ∧ e(n2,n2') ⊃ (on(n1,n2) ≡ on(n1',n2'))%1,

and

!!n3: %2∀x1 x2 y1 y2.e(x1,x2) ∧ e(y1,y2) ⊃ e(foo(x1,y1),foo(x2,y2))%1.

	If for some class ⊗C of names, we wish to assert the unique
names hypothesis, we simply use an axiom like

!!a4c:	%2∀n1 n2.n1 ε C ∧ n2 ε C ⊃ (e(n1,n2) ≡ n1 = n2)%1.

	However, we often want only to assume that distinct names denote
distinct objects when this doesn't contradict our other assumptions.
In general, our axioms won't permit making all names distinct simultaneously,
and there will be several models with maximally distinct objects.  The
simplest example is obtained by circumscribing ⊗e(x,y) while adhering
to the axiom

!!a4b:	%2e(n1,n2) ∨ e(n1,n3)%1

where ⊗n1, ⊗n2, and ⊗n3 are distinct names.  There will then be two
models, one satisfying ⊗e(n1,n2)_∧_¬e(n1,n3) and the other satisfying
⊗¬e(n1,n2)_∧_e(n1,n3). 
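The two minimal models can be exhibited by enumeration.  In the editorial sketch below, partitions of the three names stand for the candidate equivalence relations interpreting ⊗e, and we keep those satisfying ({eq a4b}) with inclusion-minimal extension:

```python
NAMES = ['n1', 'n2', 'n3']

def partitions(xs):
    """All ways of grouping the names into blocks; each partition
    induces a candidate interpretation of the equivalence e."""
    if not xs:
        yield []
        return
    head, rest = xs[0], xs[1:]
    for p in partitions(rest):
        for i in range(len(p)):
            yield p[:i] + [p[i] + [head]] + p[i + 1:]
        yield [[head]] + p

def rel(part):
    return frozenset((a, b) for block in part for a in block for b in block)

# Interpretations of e satisfying (a4b): e(n1,n2) ∨ e(n1,n3).
models = [rel(p) for p in partitions(NAMES)
          if ('n1', 'n2') in rel(p) or ('n1', 'n3') in rel(p)]
minimal = [e for e in models if not any(e2 < e for e2 in models)]
# Two minimal models survive, one identifying n1 with n2, the other
# identifying n1 with n3, and neither identifying both.
```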

	Thus circumscribing ⊗e(x,y) maximizes uniqueness of names.
If we only want unique names for some class ⊗C of names, then we
circumscribe the formula %2x_ε_C_∧_y_ε_C_⊃_e(x,y)%1.
  An
example of such a circumscription is given in Appendix B.  However, there
seems to be a price.  Part of the price is admitting names as objects.
Another part is admitting the predicate ⊗e(x,y) which is substitutive
for predicates and functions
 of names that really are about the objects denoted by
the names.  ⊗e(x,y) is not to be taken as substitutive for predicates
on names that aren't about the objects.  Of these our only present example
is equality.  Thus we don't have

*	%2∀n1 n2 n1' n2'.e(n1,n1') ∧ e(n2,n2') ⊃ ((n1 = n2) ≡ (n1' = n2'))%1.

	The awkward part of the price is that we must refrain from any
functions whose values are the objects themselves rather than names.
They would spoil the circumscription by not allowing us to infer the
distinctness of the objects denoted by distinct names.  Actually, we
can allow them provided we don't include the axioms involving them
in the circumscription.  Unfortunately, this spoils the other property
of circumscription that lets us take any facts into account.

	The examples of the use of circumscription in AI in the rest
of the paper don't interpret the variables as merely ranging over
names.  Therefore, they are incompatible with getting unique names
by circumscription as described in this section.  Presumably it wouldn't
be very difficult to revise those axioms for compatibility with the
present approach to unique names.

.skip 1
#. %3Two examples of Raymond Reiter%1

	Reiter asks about representing, "Quakers are normally pacifists
and Republicans are normally non-pacifists.  How about Nixon, who is
both a Quaker and a Republican?"  Systems of
non-monotonic reasoning that use non-provability as a basis for
inferring negation will infer that Nixon is neither a pacifist
nor a non-pacifist.  Combining these conclusions with the original
premiss leads to a contradiction.

We use

!!c1:	%2∀x.quaker x ∧ ¬ab aspect1 x ⊃ pacifist x%1,

!!c2:	%2∀x.republican x ∧ ¬ab aspect2 x ⊃ ¬ pacifist x%1

and

!!c3:	%2quaker Nixon ∧ republican Nixon%1.

	When we circumscribe ⊗ab_z using these three sentences as
⊗A(ab,pacifist), we will only be able to conclude that Nixon is either abnormal
in ⊗aspect1 or in ⊗aspect2, and we will not be able to say whether he
is a pacifist.  Of course, this is the same conclusion as would be reached
without circumscription.  The point is merely that we avoid contradiction.
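The absence of a contradiction is easy to verify by enumeration.  The following editorial sketch checks ({eq c1})-({eq c3}) over all interpretations of ⊗ab and ⊗pacifist and keeps those with inclusion-minimal ⊗ab:

```python
from itertools import chain, combinations

ASPECTS = [(1, 'Nixon'), (2, 'Nixon')]

def subsets(xs):
    xs = list(xs)
    return [frozenset(s) for s in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]

def satisfies(ab, pacifist):
    # c3: Nixon is both a quaker and a republican.
    if (1, 'Nixon') not in ab and not pacifist: return False  # c1
    if (2, 'Nixon') not in ab and pacifist: return False      # c2
    return True

models = [(ab, pac) for ab in subsets(ASPECTS) for pac in (True, False)
          if satisfies(ab, pac)]
minimal = [(ab, pac) for ab, pac in models
           if not any(ab2 < ab for ab2, _ in models)]
# Two incomparable minimal models remain, disagreeing about whether
# Nixon is a pacifist, so neither conclusion follows, and neither
# does a contradiction.
```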

	Reiter's second example is that a person normally lives in the same
city as his wife and in the same city as his employer.  But A's wife
lives in Vancouver and A's employer is in Toronto.
We write

!!c4:	%2∀x.¬ab aspect1 x ⊃ city x = city wife x%1

and

!!c5:	%2∀x.¬ab aspect2 x ⊃ city x = city employer x%1.

If we have

!!c6:	%2city wife A = Vancouver ∧ city employer A = Toronto ∧ Toronto ≠ Vancouver%1,

we will again only be able to conclude that A lives either in Toronto
or Vancouver.  In this circumscription, the function ⊗city must be taken
as variable.  This might be considered not entirely satisfactory.  If one knows
that a person either doesn't live in the same city as his wife or doesn't live
in the same city as his employer, then there is an increased probability that
he doesn't live in the same city as either.  A system that did reasoning of
this kind would seem to require a larger body of facts and perhaps more
explicitly metamathematical reasoning.  Not knowing how to do that, we might
want to use %2aspect1_x%1 in both ({eq c4}) and ({eq c5}).  Then we
would conclude nothing about his city once we knew that he wasn't in the
same city as either.
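The Toronto/Vancouver disjunction can be checked the same way.  In this editorial sketch, the function ⊗city for A is varied along with ⊗ab, while the cities of A's wife and employer are parameters fixed by ({eq c6}):

```python
from itertools import chain, combinations

def subsets(xs):
    xs = list(xs)
    return [frozenset(s) for s in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]

CITY_WIFE, CITY_EMPLOYER = 'Vancouver', 'Toronto'  # from (c6)

models = []
for ab in subsets([1, 2]):  # 1, 2 stand for ab aspect1 A, ab aspect2 A
    for city_A in ('Vancouver', 'Toronto'):  # the varied function city
        if 1 not in ab and city_A != CITY_WIFE: continue      # c4
        if 2 not in ab and city_A != CITY_EMPLOYER: continue  # c5
        models.append((ab, city_A))

minimal = [(ab, c) for ab, c in models
           if not any(ab2 < ab for ab2, _ in models)]
# The minimal models split between the two cities, so only the
# disjunction follows.
```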
.skip 2
#. %3A more general treatment of an %2is-a%* hierarchy%1

	The bird example works fine when a fixed %2is-a%1 hierarchy
is in question.  However, our writing the inheritance cancellation axioms
depended on knowing exactly from what higher level the properties were
inherited.  This doesn't correspond to my intuition of how we humans
represent inheritance.  It would seem rather that when we say that
birds can fly, we don't necessarily have in mind that an
inheritance of inability to fly from things in general is being cancelled.
We can formulate inheritance of properties in a more general way provided
we reify the properties.  Presumably there are many ways of doing this,
but here's one that seems to work.

	The first order variables of our theory range over classes of
objects (denoted by  ⊗c  with numerical suffixes), properties (denoted
by  ⊗p) and objects (denoted by ⊗x).  We don't identify our classes
with sets (or with the classes of GB set theory).  In particular, we don't
assume extensionality.  We have several predicates:

	⊗ordinarily(c,p) means that objects of class ⊗c ordinarily have
property ⊗p.  ⊗c1_≤_c2 means that class ⊗c1 ordinarily inherits from
class ⊗c2.  We assume that this relation is transitive.  ⊗in(x,c) means
that the object ⊗x is in class ⊗c.  ⊗ap(p,x) means that property ⊗p 
applies to object ⊗x.  Our axioms are

!!d0:	%2∀c1 c2 c3. c1 ≤ c2 ∧ c2 ≤ c3 ⊃ c1 ≤ c3%1,

!!d1:	%2∀c1 c2 p.ordinarily(c2,p) ∧ c1 ≤ c2 ∧ ¬ab aspect1(c1,c2,p) 
⊃ ordinarily(c1,p)%1,

!!d2:	%2∀c1 c2 c3 p.c1 ≤ c2 ∧ c2 ≤ c3 ∧ ordinarily(c2, not p) 
⊃ ab aspect1(c1,c3,p)%1,

!!d3:	%2∀x c p.in(x,c) ∧ ordinarily(c,p) ∧ ¬ab aspect3(x,c,p) ⊃ ap(p,x)%1,

!!d4:	%2∀x c1 c2 p.in(x,c1) ∧ c1 ≤ c2 ∧ ordinarily(c1,not p) 
⊃ ab aspect3(x,c2,p)%1.

({eq d0}) is the aforementioned transitivity of ≤.  ({eq d1}) says
that properties that ordinarily hold for a class are inherited unless
something is abnormal.  ({eq d2}) cancels the inheritance if there
is an intermediate class for which the property ordinarily doesn't
hold.  ({eq d3}) says that properties which ordinarily hold actually
hold for elements of the class unless something is abnormal.  ({eq d4})
cancels the effect of ({eq d3}) when there is an intermediate class
for which the negation of the property ordinarily holds.  Notice that
this reification of properties seems to require imitation boolean
operators.  Such operators are discussed in (McCarthy 1979).
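	These axioms can be exercised on a small finite model.  The
following Python sketch is only illustrative: the class names, the
hierarchy, and the %2ordinarily%1 facts are assumptions, and each object
is given a single class.  It computes the abnormality instances forced
by ({eq d2}) under minimization, inherits %2ordinarily%1 by ({eq d1}),
and applies ({eq d3}) with the ({eq d4}) cancellation.

```python
from itertools import product

classes = ["thing", "bird", "ostrich"]
# (d0): the <= relation, given reflexively and transitively closed.
leq = {(c, c) for c in classes} | {
    ("ostrich", "bird"), ("bird", "thing"), ("ostrich", "thing")}

def neg(p):                     # reified properties: ("flies", True/False)
    return (p[0], not p[1])

# The explicitly stated "ordinarily" facts (assumed for illustration).
ordinarily = {("thing",   ("flies", False)),
              ("bird",    ("flies", True)),
              ("ostrich", ("flies", False))}

# (d2): with ab minimized, exactly these aspect1 instances are abnormal.
ab1 = {(c1, c3, neg(q))
       for (c2, q) in ordinarily
       for c1, c3 in product(classes, repeat=2)
       if (c1, c2) in leq and (c2, c3) in leq}

# (d1): inherit "ordinarily" downward where not cancelled by (d2).
# A single pass suffices for this hierarchy; nothing new is derivable.
ordinarily |= {(c1, p) for (c2, p) in list(ordinarily) for c1 in classes
               if (c1, c2) in leq and (c1, c2, p) not in ab1}

member_of = {"Tweety": "bird", "Joe": "ostrich"}   # the "in" relation

def ap(p, x):
    """(d3) with the (d4) cancellation, for objects in a single class."""
    c = member_of[x]
    return (c, p) in ordinarily and (c, neg(p)) not in ordinarily

assert ap(("flies", True), "Tweety")     # birds ordinarily fly
assert not ap(("flies", True), "Joe")    # cancelled for ostriches
assert ap(("flies", False), "Joe")
```

The point of the sketch is that ({eq d2}) and ({eq d4}) refer only to an
intermediate class with the opposite property, not to the particular
class from which the property would otherwise have been inherited.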

.skip 2
#. %3The blocks world%1

	The following set of "situation calculus" axioms solves the
frame problem for a blocks world in which blocks can be moved and painted.
Here ⊗result(e,s) denotes the situation that results when event ⊗e occurs
in situation ⊗s.  The formalism is approximately that of (McCarthy and Hayes
1969).

!!b1:	%2∀x e s.¬ab aspect1(x,e,s) ⊃ location(x,result(e,s)) = location(x,s)%1.

!!b2:	%2∀x e s.¬ab aspect2(x,e,s) ⊃ color(x,result(e,s)) = color(x,s)%1.

Objects change their locations and colors only for a reason.

!!b3:	%2∀x l s.ab aspect1(x,move(x,l),s)%1

and

	%2∀x l s.¬ab aspect3(x,l,s) ⊃ location(x,result(move(x,l),s)) = l%1.

!!b4:	%2∀x c s.ab aspect2(x,paint(x,c),s)%1

and

	%2∀x c s.¬ab aspect4(x,c,s) ⊃ color(x,result(paint(x,c),s)) = c%1.

Objects change their locations when moved and their colors when painted.

!!b5:	%2∀x l s.¬clear(top x,s) ∨ ¬clear(l,s) ∨ tooheavy x ∨ l = top x
 ⊃ ab aspect3(x,move(x,l),s)%1.

This prevents the
rule ({eq b3}) from being used to infer that an object will move
if its top isn't clear, if the destination isn't clear,
or if the object is too heavy.  An object also cannot be moved to
its own top.

!!b6:	%2∀l s.clear(l,s) ≡ ¬∃x.(¬trivial x ∧ location(x,s) = l)%1.

A location is clear if all the objects there are trivial, e.g. a speck of dust.

!!b7:	%2∀x.¬ab aspect7 x ⊃ ¬trivial x%1.

Trivial objects are abnormal in %2aspect7%1.
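	The combined effect of ({eq b1})-({eq b7}) under circumscription
can be sketched as an executable model.  In the following Python sketch
the block names, the %2trivial%1 and %2tooheavy%1 objects, and the
representation of a situation as a dictionary from objects to
(location, color) pairs are all assumptions made for illustration:
locations and colors persist by default and change only when a
successful ⊗move or ⊗paint event occurs.

```python
# (b7): trivial objects, e.g. a speck of dust; (b5): too-heavy objects.
TRIVIAL = {"dust"}
TOO_HEAVY = {"piano"}

def top(x):
    return ("top", x)

def clear(l, s):
    """(b6): a location is clear iff every object there is trivial."""
    return all(x in TRIVIAL for x, (loc, _) in s.items() if loc == l)

def result(e, s):
    """result(e, s) with ab minimized: (b1)/(b2) let location and color
    persist unless (b3)/(b4) apply and (b5) does not block the move."""
    s1 = dict(s)
    kind, x, arg = e
    if kind == "move":
        if (clear(top(x), s) and clear(arg, s)
                and x not in TOO_HEAVY and arg != top(x)):
            s1[x] = (arg, s[x][1])        # (b3): the location changes
    elif kind == "paint":
        s1[x] = (s[x][0], arg)            # (b4): the color changes
    return s1                             # everything else persists

s0 = {"A": ("table", "red"), "B": ("table", "green"),
      "piano": ("table", "black")}
s1 = result(("move", "A", top("B")), s0)
s2 = result(("paint", "A", "blue"), s1)
s3 = result(("move", "piano", top("A")), s2)   # blocked by (b5)

assert s2["A"] == (top("B"), "blue")
assert s3["piano"] == ("table", "black")       # too heavy to move
assert s2["B"] == ("table", "green")           # frame axioms: unchanged
```

Notice that the frame axioms ({eq b1}) and ({eq b2}) appear only as the
default "everything else persists"; minimizing ⊗ab is what licenses that
default.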
.skip 2
#. %3An example of doing the circumscription.%1

	In order to keep the example short we will take into account
only the following facts from the earlier section on flying.

{eq a4a})	%2∀x.¬ab aspect1 x ⊃ ¬flies x%1.

{eq a5})	%2∀x.bird x ⊃ ab aspect1 x%1.

{eq a6})	%2∀x.bird x ∧ ¬ab aspect2 x ⊃ flies x%1.

{eq a7})	%2∀x.ostrich x ⊃ ab aspect2 x%1.

{eq a9})	%2∀x.ostrich x ∧ ¬ab aspect3 x ⊃ ¬flies x%1.

	Their conjunction is taken as %2A(ab,flies)%1.  This means
that what entities satisfy ⊗ab and what entities satisfy ⊗flies are
to be varied so as to minimize ⊗ab_z.  Which objects are birds
and ostriches are parameters rather than variables, i.e.
what objects are birds is considered given.

	We also need an axiom that asserts that the aspects are different.
Here is a straightforward version that would be rather long were there
more than three aspects.

.begin nofill

%2(∀x y.¬(aspect1 x = aspect2 y))

∧ (∀x y.¬(aspect1 x = aspect3 y))

∧ (∀x y.¬(aspect2 x = aspect3 y))

∧ (∀x y.aspect1 x = aspect1 y ≡ x = y)

∧ (∀x y.aspect2 x = aspect2 y ≡ x = y)

∧ (∀x y.aspect3 x = aspect3 y ≡ x = y).%1
.end

The circumscription formula ⊗A'(ab,flies) is then

!!a19:	%2A(ab,flies)
	 ∧ ∀ab' flies'.[A(ab',flies') ∧ [∀x. ab' x ⊃ ab x] ⊃ [∀x. ab x ≡ ab' x]]%1,

which spelled out becomes

!!a20:	%2[∀x.¬ab aspect1 x ⊃ ¬flies x] ∧ [∀x.bird x ⊃ ab aspect1 x]
∧ [∀x.bird x ∧ ¬ab aspect2 x ⊃ flies x] ∧ [∀x.ostrich x ⊃ ab aspect2 x]
∧ [∀x.ostrich x ∧ ¬ab aspect3 x ⊃ ¬flies x]
  ∧ ∀ab' flies'.[[∀x.¬ab' aspect1 x ⊃ ¬flies' x] ∧ [∀x.bird x ⊃ ab' aspect1 x]
∧ [∀x.bird x ∧ ¬ab' aspect2 x ⊃ flies' x] ∧ [∀x.ostrich x ⊃ ab' aspect2 x]
∧ [∀x.ostrich x ∧ ¬ab' aspect3 x ⊃ ¬flies' x]
  ∧ [∀z.ab' z ⊃ ab z]
   ⊃ [∀z.ab z ≡ ab' z]]%1.

	⊗A(ab,flies) is guaranteed to be true, because it is part of what
is assumed in our common sense data base.  Therefore ({eq a20}) reduces
to

!!a20a:	%2∀ab' flies'.[[∀x.¬ab' aspect1 x ⊃ ¬flies' x] ∧ [∀x.bird x ⊃ ab' aspect1 x]
∧ [∀x.bird x ∧ ¬ab' aspect2 x ⊃ flies' x] ∧ [∀x.ostrich x ⊃ ab' aspect2 x]
∧ [∀x.ostrich x ∧ ¬ab' aspect3 x ⊃ ¬flies' x]
  ∧ [∀z.ab' z ⊃ ab z]
   ⊃ [∀z.ab z ≡ ab' z]]%1.

	Our objective is now to make suitable substitutions for ⊗ab' and
⊗flies' so that all the terms preceding the ⊃ in ({eq a20a}) will be true,
and the right side will determine ⊗ab.  The axiom ⊗A(ab,flies) will then
determine ⊗flies, i.e. we will know what the fliers are.
⊗flies' is easy, because we need only apply wishful
thinking; we want the fliers to be just those birds that aren't ostriches.
Therefore, we put

!!a21:	%2flies' x ≡ bird x ∧ ¬ostrich x%1.

	⊗ab' isn't really much more difficult, but there is a notational
problem.  We define

!!a22:	%2ab' z ≡ [∃x.bird x ∧ z = aspect1 x] ∨ [∃x.ostrich x ∧ z = aspect2 x]%1,

which covers the cases we want to be abnormal.
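	On a small finite domain these substitutions can also be checked
by brute force: enumerate every pair satisfying %2A(ab,flies)%1 and keep
those whose ⊗ab is minimal.  The following Python sketch does this; the
three-element domain and the choice of which objects are birds and
ostriches are illustrative assumptions, not part of the formalism.

```python
from itertools import chain, combinations

objs = ["Tweety", "Joe", "Rock"]
bird, ostrich = {"Tweety", "Joe"}, {"Joe"}
aspects = [(i, x) for i in (1, 2, 3) for x in objs]  # pairwise distinct

def powerset(xs):
    return chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))

def satisfies_A(ab, flies):
    """The conjunction of (a4a), (a5), (a6), (a7) and (a9)."""
    return (all((1, x) in ab or x not in flies for x in objs)      # a4a
        and all((1, x) in ab for x in bird)                        # a5
        and all((2, x) in ab or x in flies for x in bird)          # a6
        and all((2, x) in ab for x in ostrich)                     # a7
        and all((3, x) in ab or x not in flies for x in ostrich))  # a9

models = [(frozenset(ab), frozenset(fl))
          for ab in powerset(aspects)
          for fl in powerset(objs)
          if satisfies_A(ab, fl)]

# Minimal models: those whose ab no other model strictly improves on.
minimal = [(ab, fl) for ab, fl in models
           if not any(ab2 < ab for ab2, _ in models)]

# Exactly the ab of (a22) and the flies of (a21).
ab0 = frozenset({(1, x) for x in bird} | {(2, x) for x in ostrich})
assert minimal == [(ab0, frozenset({"Tweety"}))]
```

Such an exhaustive check is of course no substitute for the proof in
Appendix A, but it shows concretely that the minimum is unique and that
the fliers come out to be the non-ostrich birds.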

	Appendix A contains a complete proof as accepted by Jussi
Ketonen's interactive theorem prover EKL (Ketonen and Weening 1984).  EKL uses the theory of
types and therefore has no problem with the second order logic required by
circumscription.
.skip 2
#. %3Prioritized circumscription%1

	An alternate way of introducing formula circumscription is by
means of an ordering on tuples of predicates satisfying an axiom.  We
define %2P_≤_P'%1 by

!!k10:	%2∀P P'.P ≤ P' ≡ ∀x.E(P,x) ⊃ E(P',x)%1.

That ⊗P0 is a relative minimum in this ordering is expressed by

!!k11:	%2∀P.P ≤ P0 ⊃ P = P0%1,

where equality is interpreted extensionally, i.e. we have

!!k12:	%2∀P P'.P = P' ≡ (∀x.E(P,x) ≡ E(P',x))%1.

If we look for a minimum among predicates ⊗P satisfying ⊗A(P), 
then ({eq k11}) expands precisely to the circumscription formula ({eq a1}).
In some earlier expositions of circumscription this ordering
approach was used, and Vladimir Lifschitz in a recent seminar advocated
returning to it as a more fundamental and understandable concept.

	I'm beginning to think he's right about it being more
understandable, but there seems to be a more fundamental reason for using
it.  Namely, certain common sense axiomatizations are easier to formalize
if we use a new kind of ordering, and circumscription based on this kind
of ordering doesn't seem to reduce to ordinary formula circumscription.

	We call it %3prioritized circumscription%1.

	Suppose we write some bird axioms in the form

!!k1:	%2∀x.¬ab aspect1 x ⊃ ¬flies x%1

and

!!k2:	%2∀x.bird x ∧ ¬ab aspect2 x ⊃ ab aspect1 x%1.

	The intent is that being a bird and
not abnormal in ⊗aspect2 should prevent
the application of ({eq k1}).  However, circumscribing ⊗ab_z with
the conjunction of ({eq k1}) and ({eq k2}) as ⊗A(ab) doesn't have this effect,
because ({eq k2}) is equivalent to

!!k3:	%2∀x.bird x ⊃ ab aspect1 x ∨ ab aspect2 x%1,

and there is no indication that one would prefer to have
⊗aspect1_x abnormal rather than to have ⊗aspect2_x abnormal.  Circumscription
then results in a disjunction which is not wanted in this case.
The need to avoid this disjunction is
why the axioms were written as they are in section 4.

	However, by using a new kind of ordering we can leave ({eq k1}) and
({eq k2}) as is, and still get the desired effect.

	We define two orderings on ⊗ab predicates, namely

!!k4:	%2∀ab ab'.ab ≤q1 ab' ≡ ∀x.ab aspect1 x ⊃ ab' aspect1 x%1

and

!!k5:	%2∀ab ab'.ab ≤q2 ab' ≡ ∀x.ab aspect2 x ⊃ ab' aspect2 x%1.

We then combine these orderings lexicographically giving %2≤q2%1
priority over %2≤q1%1 getting

!!k6:	%2∀ab ab'.ab ≤%41<2%* ab' ≡ ab ≤q2 ab' ∨ ab =q2 ab' ∧ ab ≤q1 ab'%1.

	Choosing ⊗ab0 so as to minimize this ordering yields the
result that exactly birds can fly.  However, if we add

!!k7:	%2∀x.ostrich x ⊃ ab aspect2 x%1,

we'll get, without further axioms, that ostriches (whether or not
they are birds) don't fly.  If we use

!!k8:	%2∀x.ostrich x ∧ ¬ab aspect3 x ⊃ ab aspect2 x%1

instead of ({eq k7}), we'll have to revise our notion of ordering
to put minimizing %2ab_aspect3_x%1 at higher priority than minimizing
%2ab_aspect2_x%1 and %2a fortiori%1 at higher priority than minimizing
%2ab_aspect1_x%1.

	This suggests providing a partial ordering on aspects
giving their priorities and providing axioms that permit deducing
the ordering on ⊗ab from the sentences that describe the ordering
relations.  We haven't been able to work this out yet.  It would
apparently be a considerable elaboration of the formalism.
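	The two-level ordering itself is easy to check by brute force on
a finite domain.  The following Python sketch minimizes ⊗ab under
({eq k6}) for the axioms ({eq k1}), ({eq k2}) and ({eq k7}); the domain,
the bird and ostrich facts, and the representation of ⊗ab as two
separate sets (one per aspect) are assumptions made for illustration.

```python
from itertools import combinations

objs = ["Tweety", "Joe", "Rock"]
bird, ostrich = {"Tweety", "Joe"}, {"Joe"}

def subsets(xs):
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

def ok(ab1, ab2, flies):
    """The conjunction of (k1), (k2) and (k7)."""
    return (all(x in ab1 or x not in flies for x in objs)      # k1
            and all(x in ab2 or x in ab1 for x in bird)        # k2
            and ostrich <= ab2)                                # k7

models = [(a1, a2, fl) for a1 in subsets(objs) for a2 in subsets(objs)
          for fl in subsets(objs) if ok(a1, a2, fl)]

def less(m, n):
    """(k6): m strictly below n, with aspect2 at higher priority."""
    (a1, a2, _), (b1, b2, _) = m, n
    return a2 < b2 or (a2 == b2 and a1 < b1)

minimal = [m for m in models if not any(less(n, m) for n in models)]

# In every minimal model ab holds of aspect2 exactly for ostriches and
# of aspect1 exactly for the other birds, so (k1) grounds the rest.
assert all(a2 == frozenset(ostrich) for _, a2, _ in minimal)
assert all(a1 == frozenset(bird - ostrich) for a1, _, _ in minimal)
# flies itself is not minimized: exactly the non-ostrich birds CAN fly.
assert {fl for _, _, fl in minimal} == {frozenset(), frozenset({"Tweety"})}
```

Note that the same axioms minimized under plain set inclusion would
leave the disjunction of ({eq k3}) unresolved; the lexicographic
priority is what settles it.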

	I am hopeful that %2prioritized circumscription%1 will
turn out to be the most natural and powerful variant.

.skip 2
#. %3General considerations and remarks%1

	1. Suppose we have a data base of facts axiomatized by a formalism
involving the predicate ⊗ab.  In connection with a particular problem, a
program takes a subcollection of these facts together with the specific
facts of the problem and then circumscribes ⊗ab_z.  We get a second order
formula, and in general, as the natural number example of (McCarthy 1980)
shows, this formula is not equivalent to any first order formula.
However, many common sense domains are axiomatizable in such a way that
the circumscription is equivalent to a first order formula.  In this case
we call the circumscription collapsible.  For example,
Vladimir Lifschitz (unpublished 1983 and 1984)
has shown that this is true if the axioms are from a certain class of
universal formulas and there are no functions.  This can presumably be
extended to other cases in which the ranges and domains of the functions are
disjoint, so that there is no way of generating an infinity of elements.

	Circumscription is also collapsible when the predicates are
all monadic and there are no functions.

	We can then regard the process of deciding what facts to take
into account and then circumscribing as a process of compiling from
a slightly higher level non-monotonic language into mathematical
logic, especially first order logic.  We can also regard natural
language as higher level than logic.  However, as I shall discuss
elsewhere, natural language doesn't have an independent reasoning
process, because most natural language inferences involve suppressed
premisses which are not represented in natural language in the minds
of the people doing the reasoning.

	Reiter has pointed out, both informally and implicitly in
(Reiter 1982), that circumscription often translates directly into a
Prolog program once it has been decided what facts to take into
account.

	2. Circumscription has interesting relations to Reiter's (1980)
logic of defaults.  In simple cases they give the same results.
However, a computer program using default logic would have to establish
the existence of models, perhaps by constructing them, in order to
determine that the sentences mentioned in default rules were consistent.
Such computations are not just selective applications of the rules of
inference of logic but are metamathematical.  At present this is
treated entirely informally, and I am not aware of any computer
program that finds models of sets of sentences or even
interacts with a user to find and verify such models.

	Circumscription works entirely within the logic as Appendices A
and B illustrate.  It can do this, because it uses second order logic to
import some of the model theory of first order formulas into the theory
itself.  Finding the right substitution for the predicate variables is, in
the cases we have examined, the same task as finding models of a first
order theory.  Putting everything into the logic itself is
an advantage as long as there is neither a good theory of how to construct
models nor programs that do it.

	Notice, however, that finding an interpretation of a language has
two parts - finding a domain and interpreting the predicate and
function letters by predicates and functions on the domain.  It
seems that the second is easier to import into second order logic
than the first.  This may be why our treatment of unique names is awkward.

	3. The axioms introduced in this paper are still not in
a suitable form for inclusion in a common sense database.  Some of the
circumscriptions have unwanted conclusions, e.g. that there are no
ostriches if none are explicitly mentioned.  Perhaps some of this
can be fixed by introducing the notion of present situation.  An
axiom that ostriches exist will do no harm if what is allowed to
vary includes only ostriches that are present.

	4. We are only part way to our goal of providing a formalism
in which a database of common sense knowledge can be expressed.
Besides sets of axioms involving ⊗ab, we need ways of specifying
what facts shall be taken into account and what functions and
predicates are to be taken as variable.

	5. Non-monotonic formalisms in general, and circumscription in
particular, have many as yet unrealized applications to formalizing
common sense knowledge and reasoning.  Since we have to think about
these matters in a new way, what the applications are and how to
realize them isn't immediately obvious.  Here are some suggestions.

	When we are searching for the "best" object of some kind,
we often jump to the conclusion that the best we have found so far
is the best.  This process can be represented as circumscribing
⊗better(x,candidate), where ⊗candidate is the best we have found
so far.  If we attempt this circumscription while including
certain information in our axiom ⊗A(better,P), where ⊗P represents
additional predicates being varied, we will succeed in showing
that there is nothing better only if this is consistent with
the information we take into account.  If the attempt to circumscribe
fails, we would like our reasoning program to use the failure
as an aid to finding a better object.  I don't know how hard
this would be.
.skip 2
%3Appendix A%1

Circumscription in a Proof Checker

	At present there are no reasoning or problem-solving programs
using circumscription.  A first step towards such a program involves
determining what kinds of reasoning are required to use circumscription
effectively.  As a step towards this we include in this and the following
appendix two proofs in EKL (Ketonen and Weening 1984),
 an interactive theorem prover for the
theory of types.  The first does the bird problem and the second
a simple unique names problem.  It will be seen that the proofs make
substantial use of EKL's ability to admit arguments in second order logic.

	Each EKL step begins with a command given by the user.  This is usually
followed by the sentence resulting from the step in a group of lines
each ending in a semicolon, but this is omitted
for definitions when the information is contained in the command.
We follow each step by a brief explanation.  Of course, the reader
may skip this proof if he is sufficiently clear about what steps
are involved.  However, I found that pushing the proof through EKL clarified
my ideas considerably as well as turning up bugs in my axioms.

.skip 1
.begin nofill
%7
1. (DEFINE A
     |∀AB FLIES.A(AB,FLIES)≡
                (∀X.¬AB(ASPECT1(X))⊃¬FLIES(X))∧(∀X.BIRD(X)⊃AB(ASPECT1(X)))∧
                (∀X.BIRD(X)∧¬AB(ASPECT2(X))⊃FLIES(X))∧
                (∀X.OSTRICH(X)⊃AB(ASPECT2(X)))∧
                (∀X.OSTRICH(X)∧¬AB(ASPECT3(X))⊃¬FLIES(X))| NIL)
This defines the second order predicate %2A(ab,flies)%*, where ⊗ab and
⊗flies are predicate variables.  Included here are the specific facts
about flying being taken into account.

;labels: SIMPINFO 
2. (AXIOM
    |(∀X Y.¬ASPECT1(X)=ASPECT2(Y))∧(∀X Y.¬ASPECT1(X)=ASPECT3(Y))∧
     (∀X Y.¬ASPECT2(X)=ASPECT3(Y))∧(∀X Y.ASPECT1(X)=ASPECT1(Y)≡X=Y)∧
     (∀X Y.ASPECT2(X)=ASPECT2(Y)≡X=Y)∧(∀X Y.ASPECT3(X)=ASPECT3(Y)≡X=Y)|)
These facts about the distinctness of aspects are used in step 20 only.
Since axiom 2 is labelled SIMPINFO, the EKL simplifier uses it
as appropriate when it is asked to simplify a formula.

3. (DEFINE A1
     |∀AB FLIES.A1(AB,FLIES)≡
                A(AB,FLIES)∧
                (∀AB1 FLIES1.A(AB1,FLIES1)∧(∀Z.AB1(Z)⊃AB(Z))⊃(∀Z.AB(Z)≡AB1(Z)))|
     NIL)
This is the circumscription formula itself.

4. (ASSUME |A1(AB,FLIES)|)
deps: (4)
Since EKL cannot be asked (yet) to do a circumscription, we assume the
result.  Most subsequent statements list line 4 as a dependency.
This is appropriate since circumscription is a rule of conjecture rather
than a rule of inference.

5. (DEFINE FLIES2 |∀X.FLIES2(X)≡BIRD(X)∧¬OSTRICH(X)| NIL)
This definition and the next say what we are going to substitute for
the bound predicate variables.

6. (DEFINE AB2
     |∀Z.AB2(Z)≡(∃X.BIRD(X)∧Z=ASPECT1(X))∨(∃X.OSTRICH(X)∧Z=ASPECT2(X))| 
NIL)
The fact that this definition is necessarily somewhat awkward makes
for some difficulty throughout the proof.

7. (RW 4 (OPEN A1))
A(AB,FLIES)∧(∀AB1 FLIES1.A(AB1,FLIES1)∧(∀Z.AB1(Z)⊃AB(Z))⊃(∀Z.AB(Z)≡AB1(Z)))
deps: (4)
This step merely expands out the circumscription formula.  RW stands
for "rewrite a line", in this case line 4.

8. (TRW |A(AB,FLIES)| (USE 7))
A(AB,FLIES)
deps: (4)
We separate the two conjuncts of 7 in this and the next step.

9. (TRW |∀AB1 FLIES1.A(AB1,FLIES1)∧(∀Z.AB1(Z)⊃AB(Z))⊃(∀Z.AB(Z)≡AB1(Z))| 
(USE 7))
∀AB1 FLIES1.A(AB1,FLIES1)∧(∀Z.AB1(Z)⊃AB(Z))⊃(∀Z.AB(Z)≡AB1(Z))
deps: (4)

10. (RW 8 (OPEN A))
(∀X.¬AB(ASPECT1(X))⊃¬FLIES(X))∧(∀X.BIRD(X)⊃AB(ASPECT1(X)))∧
(∀X.BIRD(X)∧¬AB(ASPECT2(X))⊃FLIES(X))∧(∀X.OSTRICH(X)⊃AB(ASPECT2(X)))∧
(∀X.OSTRICH(X)∧¬AB(ASPECT3(X))⊃¬FLIES(X))
deps: (4)
Expanding out the axiom using the definition ⊗a in step 1.

11. (ASSUME |AB2(Z)|)
deps: (11)
Our goal is step 15, but we need to assume its premiss and then
derive its conclusion.

12. (RW 11 (OPEN AB2))
(∃X.BIRD(X)∧Z=ASPECT1(X))∨(∃X.OSTRICH(X)∧Z=ASPECT2(X))
deps: (11)
We use the definition of %2ab2%*.

13. (DERIVE |AB(Z)| (12 10) NIL)
AB(Z)
deps: (4 11)
This is our first use of EKL's DERIVE command.  It is based on
the notion of direct proof of (Ketonen and Weyhrauch 1984).  Sometimes
it can do rather complicated things in one step.

14. (CI (11) 13 NIL)
AB2(Z)⊃AB(Z)
deps: (4)
We discharge the assumption 11 with the "conditional introduction" command.

15. (DERIVE |∀Z.AB2(Z)⊃AB(Z)| (14) NIL)
∀Z.AB2(Z)⊃AB(Z)
deps: (4)
Universal generalization.

16. (DERIVE
      |(∀X.¬AB2(ASPECT1(X))⊃¬FLIES2(X))∧(∀X.BIRD(X)⊃AB2(ASPECT1(X)))∧
       (∀X.BIRD(X)∧¬AB2(ASPECT2(X))⊃FLIES2(X))∧
       (∀X.OSTRICH(X)⊃AB2(ASPECT2(X)))∧
       (∀X.OSTRICH(X)∧¬AB2(ASPECT3(X))⊃¬FLIES2(X))| () (OPEN AB2 FLIES2))
;(∀X.¬AB2(ASPECT1(X))⊃¬FLIES2(X))∧(∀X.BIRD(X)⊃AB2(ASPECT1(X)))∧
;(∀X.BIRD(X)∧¬AB2(ASPECT2(X))⊃FLIES2(X))∧(∀X.OSTRICH(X)⊃AB2(ASPECT2(X)))∧
;(∀X.OSTRICH(X)∧¬AB2(ASPECT3(X))⊃¬FLIES2(X))
This is another rather lengthy computation, but it tells us that ab2 and
flies2 satisfy the axioms for ab and flies.

17. (UE ((AB.|AB2|) (FLIES.|FLIES2|)) 1 NIL)
;A(AB2,FLIES2)≡
;(∀X.¬AB2(ASPECT1(X))⊃¬FLIES2(X))∧(∀X.BIRD(X)⊃AB2(ASPECT1(X)))∧
;(∀X.BIRD(X)∧¬AB2(ASPECT2(X))⊃FLIES2(X))∧(∀X.OSTRICH(X)⊃AB2(ASPECT2(X)))∧
;(∀X.OSTRICH(X)∧¬AB2(ASPECT3(X))⊃¬FLIES2(X))
Now we substitute ⊗ab2 and ⊗flies2 in the definition of ⊗A and get
a result we can compare with step 16.

18. (RW 17 (USE 16))
;A(AB2,FLIES2)
We have shown that ⊗ab2 and ⊗flies2 satisfy ⊗A. 

19. (DERIVE |∀Z.AB(Z)≡AB2(Z)| (9 15 18) NIL)
;∀Z.AB(Z)≡AB2(Z)
;deps: (4)
9 was the circumscription formula, and 15 and 18 are its two premisses,
so we can now derive its conclusion.  Now we know exactly what entities
are abnormal.

20. (RW 8 ((USE 1 MODE: EXACT) ((USE 19 MODE: EXACT) (OPEN AB2))))
;(∀X.¬(∃X1.BIRD(X1)∧X=X1)⊃¬FLIES(X))∧
;(∀X.BIRD(X)∧¬(∃X2.OSTRICH(X2)∧X=X2)⊃FLIES(X))∧(∀X.OSTRICH(X)⊃¬FLIES(X))
;deps: (4)
We rewrite the axiom now that we know what's abnormal.  This gives
a somewhat awkward formula that nevertheless contains the desired
conclusion.  The occurrences of equality are left over from the
elimination of the aspects that used the axiom of step 2.

21. (DERIVE |∀X.FLIES(X)≡BIRD(X)∧¬OSTRICH(X)| (20) NIL)
;∀X.FLIES(X)≡BIRD(X)∧¬OSTRICH(X)
;deps: (4)
DERIVE straightens out 20 to put the conclusion in the desired form.
The result is still dependent on the assumption of the correctness of
the circumscription made in step 4.
%1

	Clearly if circumscription is to become a practical technique,
the reasoning has to become much more automatic.

.end
.skip 1
%3Appendix B%1

	Here is an annotated EKL proof that circumscribes the predicate
%2e(x,y)%1 discussed in section 6.
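	The content of the proof can be previewed computationally: the
circumscription of  ax  picks out the least equivalence relation
containing  (a,b),  which on a finite domain is just the
reflexive-symmetric-transitive closure.  The following Python sketch
(the four-element domain matches the names used in the proof) computes
that closure and compares it with the  e2  of step 8.

```python
dom = ["a", "b", "c", "d"]

def equiv_closure(pairs):
    """The least equivalence relation on dom containing the pairs:
    the reflexive-symmetric-transitive closure."""
    e = {(x, x) for x in dom} | set(pairs)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(e):
            # add the symmetric pair and all transitive consequences
            new = [(y, x)] + [(x, z) for (y2, z) in e if y2 == y]
            for p in new:
                if p not in e:
                    e.add(p)
                    changed = True
    return e

e0 = equiv_closure({("a", "b")})
e2 = {(x, y) for x in dom for y in dom
      if (x == "a" and y == "b") or (x == "b" and y == "a") or x == y}
assert e0 == e2    # the relation obtained at step 17 of the proof
```

The EKL proof establishes the same fact deductively, inside the logic,
rather than by computing the closure.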

.skip 1
.begin nofill
%7
(proof unique_names)
What the user types is indicated by the numbered statements in lower
case.  What EKL types is preceded by semicolons at the beginning of
each line and is in upper case.  We omit EKL's type-out when it merely
repeats what the command asked it to do, as in the commands  DERIVE,
ASSUME  and DEFINE.

1. (axiom |index a = 1 ∧ index b = 2 ∧ index c = 3 ∧ index d = 4|)
Since EKL does not have attachments to determine the equivalence of names,
we establish a correspondence between the names in our domain and some
natural numbers.

2. (derive |¬(1=2) ∧ ¬(1=3) ∧ ¬(2=3) ∧ ¬(1=4) ∧ ¬(2=4) ∧ ¬(3=4)|)
EKL does know about the distinctness of natural numbers, so this
can be derived.

(der-slow)
We have to tell EKL to use the properties of equality rather than
regarding it as just another predicate symbol in order to do the next
step.  Sometimes this leads to combinatorial explosion.

3. (derive |a ≠ b| (1 2))
This shows that two names themselves are distinct.

4. (define equiv |∀e.equiv e ≡ (∀x.e(x,x)) ∧ (∀x y.e(x,y) ⊃ e(y,x))
	∧ (∀x y z.e(x,y) ∧ e(y,z) ⊃ e(x,z))|)
Here we use second order logic to define the notion of equivalence
relation.  The first word after "define" is the entity being defined
and included between vertical bars is the defining relation.  EKL
checks that an entity satisfying the relation exists.

5. (define ax |∀e.ax e ≡ e(a,b) ∧ equiv e|)
We define  ax  as a predicate we want our imitation equality to
satisfy.  We have chosen a very simple case, namely making  a  and
b  "equal" and nothing else.

6. (define ax1 |∀e.ax1 e ≡ ax e ∧ ∀e1.(ax e1 ∧ (∀x y.e1(x,y) ⊃ e(x,y))
	⊃ (∀x y.e(x,y) ≡ e1(x,y)))|)
This defines  ax1  as the second order predicate specifying the
circumscription of  ax.

7. (assume |ax1(e0)|)
(label circum)
We now specify that  e0  satisfies  ax1.  It takes till step 17
to determine what  e0  actually is.  When EKL includes circumscription
as an operator, we may be able to write something like  circumscribe(e0,ax1)
and make this step occur.  For now it's just an ordinary assumption.

8. (define e2 |∀x y.e2(x,y) ≡ (x = a ∧ y = b) ∨ (x = b ∧ y = a) ∨ x = y|)
The predicate  e2  defined here is what  e0  will turn out to be.

9. (derive |equiv e2| nil (open equiv) (open e2))
Now EKL agrees that  e2  is an equivalence relation.  This step takes
the KL-10 about 14 seconds.

10. (derive |ax e2| (9) (open ax) (open e2))
(label ax_e2)
Moreover it satisfies  ax.

11. (rw circum (open ax1))
;AX(E0)∧(∀E1.AX(E1)∧(∀X Y.E1(X,Y)⊃E0(X,Y))⊃(∀X Y.E0(X,Y)≡E1(X,Y)))
;deps: (CIRCUM)
A trivial step of expanding the definition of  ax1.  EKL tells us
that this fact depends on the assumption CIRCUM.  So do many of
the subsequent lines of the proof, but we omit it henceforth to
save space.

12. (trw |ax e0| (use 11))
;AX(E0)
The first conjunct of the previous step.

13. (rw 12 (open ax equiv))
(label fact1)
;E0(A,B)∧(∀X.E0(X,X))∧(∀X Y.E0(X,Y)⊃E0(Y,X))∧(∀X Y Z.E0(X,Y)∧E0(Y,Z)⊃E0(X,Z))
We expand  ax e0  according to the definitions of  ax  and  equiv.

14. (derive |∀p q r.(p ∨ q ⊃ r) ≡ (p ⊃ r) ∧ (q ⊃ r)|)
(label rewrite_by_cases)
This is a fact of propositional calculus used as a rewrite rule in the
next step.  A program that can use circumscription by itself will either
need to generate this step or systematically avoid the need for it.

15. (trw |e2(x,y) ⊃ e0(x,y)|
      ((open e2) (use rewrite_by_cases mode: always) (use fact1)))
;E2(X,Y)⊃E0(X,Y)
This is the least obvious step, because  rewrite_by_cases is used
after some preliminary transformation of the formula.

16. (derive |∀x y.e0(x,y) ≡ e2(x,y)| (ax_e2 11 15))
DERIVE is substituting  e2  for the variable  e1  in step 11 and using
the fact  ax(e2)  and step 15 to infer the conclusion of the implication
that follows the quantifier ∀e.

17. (rw 16 (open E2))
;∀X Y.E0(X,Y)≡X=A∧Y=B∨X=B∧Y=A∨X=Y
;deps: (CIRCUM)
Expanding the definition of  e2  tells us the final result of circumscribing
e0(x,y).  A more complex  ax(e0) - see step 5 - would give a more complex
result upon circumscription.  However, it seems that the proof would be
similar.  Therefore, it could perhaps be made into some kind of macro.

.end
.skip 1
#. %3Acknowledgments%1

	I have had useful discussions with Matthew Ginsberg,
Benjamin Grosof, Vladimir Lifschitz and Leslie Pack.  The work was partially
supported by NSF and by DARPA.  I also thank Jussi Ketonen
for developing EKL and helping me with its use.  In particular
he greatly shortened the unique names proof.
.skip 1
#. %3References:%1

.single space
%3Etherington, David W. and Raymond Reiter (1983)%1: "On Inheritance
Hierarchies with Exceptions", in %2Proceedings of the National
Conference on Artificial Intelligence, AAAI-83%1, William Kaufman, Inc.

%3Ketonen, Jussi and Joseph S. Weening (1984)%1: %2EKL - An Interactive
Proof Checker, User's Reference Manual%1, Computer Science Department,
Stanford University.

%3Ketonen, Jussi and Richard W. Weyhrauch (1984)%1:
"A Decidable Fragment of Predicate Calculus",
 accepted for publication in the %2Journal for Theoretical Computer Science%1.

%3Lifschitz, Vladimir (1983)%1: unpublished note on circumscription.

%3McCarthy, John and P.J. Hayes (1969)%1:  "Some Philosophical Problems from
the Standpoint of Artificial Intelligence", in D. Michie (ed), %2Machine
Intelligence 4%1, American Elsevier, New York, NY.

%3McCarthy, John (1977)%1:
"Epistemological Problems of Artificial Intelligence", %2Proceedings
of the Fifth International Joint Conference on Artificial 
Intelligence%1, M.I.T., Cambridge, Mass.

%3McCarthy, John (1979)%1: 
"First Order Theories of Individual Concepts and Propositions", 
in Michie, Donald (ed.) %2Machine Intelligence 9%1, (University of
Edinburgh Press, Edinburgh).
.<<aim 325,concep[e76,jmc]>>

%3McCarthy, John (1980)%1: 
"Circumscription - A Form of Non-Monotonic Reasoning", %2Artificial
Intelligence%1, Volume 13, Numbers 1,2, April.
.<<aim 334, circum.new[s79,jmc]>>

%3Minsky, Marvin (1982)%1: in various lectures including AAAI Presidential
Address though not in its written form.

%3Reiter, Raymond (1980)%1: "A Logic for Default Reasoning", %2Artificial
Intelligence%1, Volume 13, Numbers 1,2, April.

%3Reiter, Raymond (1980b)%1: "Equality and domain closure in first order
data bases", %2Journal of the ACM%1, Volume 27, April 1980, pp. 235-249.

%3Reiter, Raymond (1982)%1: "Circumscription Implies Predicate Completion
(Sometimes)", Proceedings of the National Conference on Artificial
Intelligence, AAAI-82.